Research on Tibetan Text Orientation Identification
Abstract
In recent years, minority languages in China have come into wide use on computers and networks. However, there is still no effective public opinion analysis system for gauging the overall attitude of minority communities toward hot events or topics. In this study, we investigate Tibetan topic orientation recognition. First, drawing on Tibetan context and the characteristics of Tibetan life, and combining these with a set of emotional words from HowNet, we build a Tibetan emotional word dictionary; we then extend this dictionary using a Tibetan word semantic similarity calculation method to obtain a rich set of emotional words. We also propose a method in which the orientation of a sentence is determined by the orientation of the words it contains, and the orientation of a text is determined by the orientation of its sentences. With this approach, Tibetan hotspot information can be detected rapidly and trends in public opinion can be tracked quickly, which benefits the positive guidance of public opinion.
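The word-to-sentence-to-text scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual aggregation rule, similarity measure, or threshold, so everything here is an assumption made for illustration: polarities are +1/-1, orientation is the sign of a simple sum, a candidate word inherits the polarity of its most similar seed word, and the names `expand_lexicon`, `sentence_orientation`, `text_orientation`, and the `threshold` value are hypothetical. Input is assumed to be already segmented into words.

```python
from typing import Callable, Dict, Iterable, List


def expand_lexicon(
    seed: Dict[str, int],
    candidates: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed cutoff, not taken from the paper
) -> Dict[str, int]:
    """Extend a seed emotional-word dictionary (word -> +1 or -1).

    Each candidate takes the polarity of its most similar seed word,
    provided that similarity reaches the threshold.
    """
    expanded = dict(seed)
    for word in candidates:
        if word in expanded:
            continue
        best = max(seed, key=lambda s: similarity(word, s), default=None)
        if best is not None and similarity(word, best) >= threshold:
            expanded[word] = seed[best]
    return expanded


def _sign(value: int) -> int:
    """Return 1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def sentence_orientation(tokens: List[str], lexicon: Dict[str, int]) -> int:
    """Orientation of a sentence: sign of the summed word polarities."""
    return _sign(sum(lexicon.get(token, 0) for token in tokens))


def text_orientation(sentences: List[List[str]], lexicon: Dict[str, int]) -> int:
    """Orientation of a text: sign of the summed sentence orientations."""
    return _sign(sum(sentence_orientation(s, lexicon) for s in sentences))
```

A call such as `text_orientation(segmented_sentences, expand_lexicon(seed, vocabulary, similarity))` returns 1 for a positive text, -1 for a negative one, and 0 when the evidence is balanced or absent. A real system would likely also weight words and handle negation and degree modifiers, which this sketch omits.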
Similar resources
Tibetan Unknown Word Identification from News Corpora for Supporting Lexicon-based Tibetan Word Segmentation
In Tibetan, as words are written consecutively without delimiters, finding unknown word boundary is difficult. This paper presents a hybrid approach for Tibetan unknown word identification for offline corpus processing. Firstly, Tibetan named entity is preprocessed based on natural annotation. Secondly, other Tibetan unknown words are extracted from word segmentation fragments using MTC, the co...
Research on Tibetan Automatic Word Segmentation
This paper researches on Tibetan automatic word segmentation. We focus on three key technologies of Tibetan automatic word segmentation: (1) a Tibetan automatic word segmentation approach is proposed, which is taking the advantage of case-auxiliary words and continuous feature. (2) a resolution method of overlapping ambiguity in Tibetan word segmentation is proposed, which is based on forward-b...
Tibetan Number Identification Based on Classification of Number Components in Tibetan Word Segmentation
Tibetan word segmentation is essential for Tibetan information processing. People mainly use the basic machine matching method which is based on dictionary to segment Tibetan words at present, because there is no segmented Tibetan corpus which can be used for training in Tibetan word segmentation. But the method based on dictionary is not fit to Tibetan number identification. This paper studies...
Tibetan Syllable-Based Functional Chunk Boundary Identification
Tibetan syntactic functional chunk parsing is aimed at identifying syntactic constituents of Tibetan sentences. In this paper, based on the Tibetan syntactic functional chunk description system, we propose a method which puts syllables in groups instead of word segmentation and tagging and use the Conditional Random Fields (CRFs) to identify the functional chunk boundary of a sentence. Accordin...
Tibetan Multi-word Expressions Identification Framework Based on News Corpora
This paper presents an identification framework for extracting Tibetan multi-word expressions. The framework includes two phases. In the first phase, sentences are segmented and high-frequency word-based n-grams are extracted using Nagao’s N-gram statistical algorithm and Statistical Substring Reduction Algorithm. In the second phase, the Tibetan MWEs are identified by the proposed framework wh...
Journal title:
- JCP
Volume 9
Publication date 2014